Random Networks with Tunable Degree Distribution and Clustering 






Erik Volz 
Cornell University, Ithaca, NY 14853
(Dated: February 2, 2008) 

We present an algorithm for generating random networks with arbitrary degree distribution and clustering (frequency of triadic closure). We use this algorithm to generate networks with exponential, power-law, and Poisson degree distributions with variable levels of clustering. Such networks may be used as models of social networks and as a testable null hypothesis about network structure. Finally, we explore the effects of clustering on the point of the phase transition where a giant component forms in a random network, and on the size of the giant component. Some analysis of these effects is presented.
I. INTRODUCTION

Numerous random network models have been proposed to replicate important aspects of the topology of real-world networks […].

In particular, much attention has been paid to the degree distribution and the clustering coefficient. A great deal of progress has been made on network models which combine certain degree distributions with some level of clustering […]. It has been an open problem to combine these two features in the most general way. Is it possible to have a network model which is flexible enough to accommodate any combination of degree distribution and clustering? In this article we propose such a model and demonstrate its effectiveness by generating networks over a wide range of parameters.

Random network models fall into several broad categories. Some models have focused on Monte Carlo techniques to reproduce a specific topology […]. Other models have focused on plausible mechanisms for creating a network, such as preferential attachment, while some models have specific topologies built into them (e.g. regular lattices) in order to explicate the so-called "small-world" problem […]. The model proposed here lacks the intuitive appeal of mechanism-based models, but it bears the most resemblance to this category. In common with most mechanism-based models, we produce our networks by growing them from one initial node. Most network growth models have been motivated by plausible mechanisms for how nodes enter a network and form links. We find that being able to construct a network one node at a time also offers sufficient flexibility to combine arbitrary degree distributions and clustering.

Given a network model which can combine arbitrary degree distributions and clustering, it is of great interest to explore the effects of these parameters on quantities such as the size of the giant component and the point of the phase transition where a giant component forms. This is true of clustering in particular, as models capable of interpolating between the extremes of this parameter have so far been lacking. In section III we explore the effects of clustering on the size of the giant component and the point of the phase transition. In section IV we present some analysis of our observations.

Throughout this article we will rely on the following definitions. The degree distribution of a network describes how many neighbors the nodes in a network have. The probability of a node having degree k is given by the degree distribution $p_k$, where $p_k$ can take the form of any well-defined discrete probability distribution over the non-negative integers. Examples frequently employed in the literature are:

• Poisson: $p_k = \frac{z^k e^{-z}}{k!}$, $k \ge 0$.

• Power law: For our experiments, we use power laws with finite cutoffs $\kappa$: $p_k = \frac{k^{-\gamma} e^{-k/\kappa}}{\mathrm{Li}_\gamma(e^{-1/\kappa})}$, $k \ge 1$, where $\mathrm{Li}_n(x)$ is the $n$th polylogarithm of $x$.

• Exponential: $p_k = (1 - e^{-1/\lambda})\, e^{-k/\lambda}$, $k \ge 0$.

• Empirical: The degree distribution is estimated from a sample of a network.

• Gaussian
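As an illustration of these definitions, the Python sketch below tabulates each distribution on $0, \dots, k_{\max}$, renormalizes it, and draws i.i.d. degrees, which is what the construction algorithm in section II needs as input. The helper names and the truncation at `k_max` (which stands in for the infinite polylogarithm sum) are assumptions of this sketch, not part of the original method.

```python
import math
import random
from typing import Callable, List


def poisson_pmf(k: int, z: float) -> float:
    # p_k = z^k e^{-z} / k!, evaluated in log space to avoid overflow (k >= 0, z > 0)
    return math.exp(k * math.log(z) - z - math.lgamma(k + 1))


def exponential_pmf(k: int, lam: float) -> float:
    # p_k = (1 - e^{-1/lam}) e^{-k/lam}, k >= 0
    return (1.0 - math.exp(-1.0 / lam)) * math.exp(-k / lam)


def power_law_weight(k: int, gamma: float, kappa: float) -> float:
    # Unnormalized k^{-gamma} e^{-k/kappa} for k >= 1; summed over all k >= 1,
    # these weights give Li_gamma(e^{-1/kappa}), the normalizing constant.
    return 0.0 if k < 1 else k ** (-gamma) * math.exp(-k / kappa)


def truncated_table(f: Callable[[int], float], k_max: int) -> List[float]:
    # Evaluate f on 0..k_max and renormalize; the tail beyond k_max is dropped.
    raw = [f(k) for k in range(k_max + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_degrees(table: List[float], n: int, seed: int = 0) -> List[int]:
    # Draw n i.i.d. degrees from a tabulated distribution.
    rng = random.Random(seed)
    return rng.choices(range(len(table)), weights=table, k=n)


if __name__ == "__main__":
    p_poisson = truncated_table(lambda k: poisson_pmf(k, 4.0), 60)
    p_exp = truncated_table(lambda k: exponential_pmf(k, 1.5), 200)
    p_power = truncated_table(lambda k: power_law_weight(k, 2.0, 15.0), 2000)
    degrees = sample_degrees(p_power, 1500)
    print(min(degrees), max(degrees), sum(degrees) / len(degrees))
```

Working with an explicit table keeps sampling identical for all three families; an empirical distribution can be plugged in the same way by tabulating observed degree frequencies.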

The clustering coefficient C describes the proportion of triads in a network out of the total number of potential triads. Formally, the clustering coefficient is defined as

$$C = \frac{3 N_\Delta}{N_3},$$

where $N_\Delta$ is the number of triads in the network and $N_3$ is the number of connected triples of nodes. Note that every triad contains three connected triples.
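The definition can be checked on a small graph. The sketch below (the function name and adjacency-set representation are illustrative choices) counts each triad once at each of its three corners, which gives $3N_\Delta$ directly, and counts connected triples as $\sum_v \binom{k_v}{2}$.

```python
from itertools import combinations
from typing import Dict, Set


def clustering_coefficient(adj: Dict[int, Set[int]]) -> float:
    """Global clustering C = 3 N_triads / N_3 for a simple undirected graph."""
    corners = 0   # each triad is counted once at each of its 3 vertices: 3 * N_triads
    triples = 0   # N_3: connected triples, i.e. sum over vertices of binom(k, 2)
    for v, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        corners += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return corners / triples if triples else 0.0


# A triad {0, 1, 2} with a pendant vertex 3 attached to node 0:
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficient(g))  # 3 * 1 / 5 = 0.6
```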

There is also a measure of local clustering given by

$$C(k) = \frac{N_\Delta(k)}{\binom{k}{2}},$$

where $N_\Delta(k)$ is the average number of triads connected to vertices of degree k, and $\binom{k}{2}$ is the number of potential triads connected to a vertex of degree k.
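A per-degree version of the previous count gives $C(k)$. This is a minimal sketch with an illustrative function name; it skips degrees below 2, for which $\binom{k}{2} = 0$ and $C(k)$ is undefined.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set


def local_clustering_by_degree(adj: Dict[int, Set[int]]) -> Dict[int, float]:
    """C(k) = N_triads(k) / binom(k, 2), with N_triads(k) the mean number of
    triads at vertices of degree k; degrees below 2 are skipped."""
    tri_total: Dict[int, int] = defaultdict(int)
    n_vertices: Dict[int, int] = defaultdict(int)
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        tri_total[k] += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        n_vertices[k] += 1
    return {k: (tri_total[k] / n_vertices[k]) / (k * (k - 1) / 2) for k in n_vertices}


g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering_by_degree(g))  # C(3) = 1/3, C(2) = 1
```

Because the model described below distributes triads uniformly, $C(k)$ computed this way on its output should be roughly flat in $k$.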
II. RANDOM NETWORK MODEL

Introducing clustering into a network with a specified degree distribution is a nontrivial problem. Any method aspiring to introduce an arbitrary amount of clustering into a network must interpolate between two extremely different topologies. When clustering is 0%, the method must reproduce pure random networks with specified degree distributions. When clustering is 100%, there is only one configuration a network may have: each node must belong to a small clique in which every node has the same degree, so that all of a node's neighbors are connected with one another. This challenge is made all the more difficult by requiring the model to be general enough to accommodate any desired degree distribution.

The most obvious way of introducing triads is to simply define a rewiring rule whereby links are swapped between nodes so as to introduce triads while leaving the degree distribution the same. Such rewiring schemes quickly run into problems, as it is impossible to define a rule under which the number of triads increases strictly and does not saturate. The problem is that when links are "swapped" among nodes, triads are not only created but can also be destroyed. For example, we have found that such schemes are effective only for introducing about 15% clustering into a Poisson random network.

Newman [21] and Guillaume et al. […] have had some success with another approach. These authors define a bipartite network of individuals and affiliations. They then project the bipartite network onto a unipartite network of individuals only, connecting two nodes if they share a common affiliation. The distribution of affiliation sizes and the affiliation-degree distribution of the nodes are chosen in such a way as to produce a desired level of clustering. Tuning the degree distribution simultaneously has proven more challenging, however. While the bipartite projection method may have the potential to generate pure random networks with tunable degree distributions and clustering, so far its efficacy has only been shown for exponential and power-law random networks, and it remains an open problem to implement it for arbitrary degree distributions.
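The projection step of this alternative approach is simple to state in code. The sketch below illustrates only that step, with a hypothetical function name: each affiliation becomes a clique among its members, which is where the clustering comes from. It does not show how the affiliation sizes and affiliation-degrees are chosen to reach a target clustering level, which is the hard part of the cited work.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set


def project_affiliations(affiliations: Iterable[List[int]]) -> Dict[int, Set[int]]:
    """One-mode projection: link two individuals if they share an affiliation."""
    adj: Dict[int, Set[int]] = {}
    for members in affiliations:
        group = set(members)
        for m in group:
            adj.setdefault(m, set())
        for a, b in combinations(group, 2):   # each affiliation becomes a clique
            adj[a].add(b)
            adj[b].add(a)
    return adj


print(project_affiliations([[0, 1, 2], [2, 3]]))
```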

Our method works by growing networks. The algorithm first initializes all nodes with a degree drawn i.i.d. from the desired degree distribution. Then the random network is constructed by an iterative procedure similar to a branching process. The premise is to start from a single node and then assign new connections entirely at random, under the constraint that a certain amount of clustering must exist. The algorithm is described in detail below and is schematized in figure 1. Two example networks are shown in figure 2.

FIG. 1: Overview of the network construction process. The first node (far left) is chosen at random. Then neighbors for that node are chosen as described in the text. Subsequently, neighbors are chosen for the new nodes, but now new connections are formed with nodes two steps away with probability C. Triadic connections are indicated with dotted lines. This process continues until the waves die out, at which point a new component is started, or until all nodes are exhausted.

1. Initialize all nodes with a degree drawn i.i.d. from the degree distribution.

2. Form a list of "stubs" (connections of nodes which have not yet been matched with neighbors). Call this list StubList.

3. Pick a starting node, $v_0$, uniformly at random from all nodes.

4. For each of $v_0$'s stubs, choose a new neighbor by picking an element $v_1$ from StubList with probability $P(v_1 \mid d(v_0))$, as described in the text. If the new neighbor is not

• the same vertex as $v_0$, or

• already connected to $v_0$,

then form the connection. Otherwise, repeat the process until a valid neighbor is found. Add all of the neighbors obtained from this process to a list called NextWave.

5. Copy all elements of NextWave to a list called CurrentWave. Remove all elements from NextWave. For all elements in CurrentWave:

(a) Form a list of all nodes two steps away; call this list PotentialTriads.

(b) For all stubs which have not been assigned neighbors:

i. Scan through PotentialTriads. With probability $C_{\mathrm{input}}$, connect to vertex $v_3 \in$ StubList. Remove element $v_3$ from StubList.

ii. If no neighbors were selected from PotentialTriads, select a new neighbor by choosing from StubList as above. If the new neighbor is not in CurrentWave and is not already in NextWave, add it to NextWave.
6. Repeat the last step until NextWave is empty following an iteration. Then, if StubList is empty, the process is complete: all connections have been formed. Otherwise, start a new component by choosing a new starting vertex uniformly at random from those not yet in the network.
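The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the author's implementation: the function name is hypothetical; new neighbors are drawn from StubList uniformly over free stubs, so the degree-compatibility weighting $P(v_1 \mid d(v_0))$ discussed below is omitted; PotentialTriads is computed once per node; and when no valid neighbor exists, the node's remaining degree is truncated (the finite-size correction described in section V). The realized clustering of its output therefore need not match `c_input` exactly.

```python
import random
from typing import Dict, List, Optional, Set


def grow_clustered_network(degrees: List[int], c_input: float,
                           seed: int = 0) -> Dict[int, Set[int]]:
    """Simplified wave-growth sketch of steps 1-6 (no degree-compatibility weighting)."""
    rng = random.Random(seed)
    n = len(degrees)
    free = list(degrees)                  # unmatched stubs per node (plays the role of StubList)
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    unvisited: Set[int] = set(range(n))   # nodes not yet in the network

    def connect(u: int, v: int) -> None:
        adj[u].add(v)
        adj[v].add(u)
        free[u] -= 1
        free[v] -= 1

    def pick_from_stubs(u: int) -> Optional[int]:
        # Uniform over free stubs (nodes weighted by free-stub count), restricted to
        # valid neighbors: not u itself and not already connected to u.
        cands = [v for v in range(n) if free[v] > 0 and v != u and v not in adj[u]]
        if not cands:
            return None
        return rng.choices(cands, weights=[free[v] for v in cands], k=1)[0]

    while unvisited:
        start = rng.choice(list(unvisited))   # steps 3 and 6: start a (new) component
        unvisited.discard(start)
        current_wave = [start]
        while current_wave:                   # step 5, repeated until no new wave forms
            next_wave: List[int] = []
            for u in current_wave:
                # (a) PotentialTriads: nodes exactly two steps away from u
                potential = {w for v in adj[u] for w in adj[v]} - adj[u] - {u}
                # (b) fill u's remaining stubs
                while free[u] > 0:
                    v = None
                    for w in potential:        # i. close a triad with probability c_input
                        if free[w] > 0 and rng.random() < c_input:
                            v = w
                            break
                    if v is None:              # ii. otherwise draw from the stub list
                        v = pick_from_stubs(u)
                    if v is None:              # finite-size fallback: truncate u's degree
                        free[u] = 0
                        break
                    connect(u, v)
                    potential.discard(v)
                    if v in unvisited:         # newly reached nodes form the next wave
                        unvisited.discard(v)
                        next_wave.append(v)
            current_wave = next_wave
    return adj


if __name__ == "__main__":
    demo_rng = random.Random(42)
    degs = [demo_rng.randint(1, 6) for _ in range(300)]
    g = grow_clustered_network(degs, c_input=0.3, seed=7)
    print(sum(len(s) for s in g.values()) // 2, "edges")
```

Filtering the candidate list before drawing is equivalent to the "repeat until a valid neighbor is found" rule in step 4, since rejection sampling and drawing from the restricted distribution give the same result.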

Our model has similarities and differences with other models proposed in the literature. Like the algorithm of Milo et al. […], each node is assigned its own fixed degree prior to any edges being formed between nodes. But like the model networks of Barabási […], Dorogovtsev et al. […], and many others, the network is constructed via a growth process. The first node is chosen at random, and subsequently nodes are added to the graph by attaching them to nodes which still have unmatched stubs. When a new node forms its own connections, it first forms a list of all nodes which are two steps away. Then, with probability $C_{\mathrm{input}}$, a node from this list is selected as the next neighbor.

One complicated feature of this algorithm concerns the probability of selecting a new neighbor from the stub list. In fact, new neighbors cannot be selected uniformly at random from the stub list, as clustering implies a certain amount of degree assortativity among the nodes in the network. For example, a node connected to a degree-k node has $k - 1$ potential triads in common with that node, and on average will have $C(k - 1)$ common triads. This implies that the node must have, on average, a degree of at least $C(k - 1)$.

Because triads are distributed uniformly throughout the network, the number of triads connected to a vertex of degree k is distributed as $\mathrm{Binomial}\left(\binom{k}{2}, C\right)$. As noted above, the number of common triads with a neighbor of degree k is distributed as $\mathrm{Binomial}(k - 1, C)$. Let $T_{ij}$ denote the number of triads node i has in common with node j, and $T_{ji}$ denote the number of triads j has in common with i. These two random variables should, of course, be equal. With $k_i$ and $k_j$ the degrees of i and j, so that $T_{ij} \sim \mathrm{Binomial}(k_i - 1, C)$ and $T_{ji} \sim \mathrm{Binomial}(k_j - 1, C)$, we can calculate the probability that the two potential neighbors have an equal number of common triads as

$$P_{ij} = \sum_{x=0}^{\min(k_i, k_j) - 1} P(T_{ij} = x)\, P(T_{ji} = x).$$

Let $q_j$ denote the probability of selecting node j from the stub list. Then the correct probability for selecting node j as a neighbor of node i is

$$q'_j = \frac{q_j P_{ij}}{\sum_l q_l P_{il}},$$

which is just $q_j$ weighted by the probability of the two neighbors having a compatible number of triads in common.
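The two formulas above can be evaluated directly. The following sketch (hypothetical helper names; it assumes all degrees are at least 1, as they are for nodes holding stubs) computes $P_{ij}$ from the two binomial distributions and the reweighted selection probabilities $q'_j$.

```python
from math import comb
from typing import List


def binom_pmf(x: int, n: int, p: float) -> float:
    # P(X = x) for X ~ Binomial(n, p); zero outside 0..n
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x) if 0 <= x <= n else 0.0


def triad_compatibility(k_i: int, k_j: int, c: float) -> float:
    # P_ij = sum_x P(T_ij = x) P(T_ji = x), T_ij ~ Bin(k_i - 1, C), T_ji ~ Bin(k_j - 1, C)
    return sum(binom_pmf(x, k_i - 1, c) * binom_pmf(x, k_j - 1, c)
               for x in range(min(k_i, k_j)))


def weighted_selection(q: List[float], degrees: List[int], k_i: int, c: float) -> List[float]:
    # q'_j = q_j P_ij / sum_l q_l P_il
    w = [qj * triad_compatibility(k_i, kj, c) for qj, kj in zip(q, degrees)]
    total = sum(w)
    return [wj / total for wj in w]


# A degree-3 node choosing between a degree-1 and a degree-3 candidate, C = 0.5:
print(weighted_selection([0.5, 0.5], [1, 3], k_i=3, c=0.5))  # [0.4, 0.6]
```

With equal stub-list probabilities, the candidate whose degree matches the chooser's is favoured, which is the degree assortativity described above.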

In order to sample from this distribution, we use Markov chain Monte Carlo techniques. For a large number of iterations, we select a new node β from the stub list and accept this new neighbor with probability $a_{\alpha\beta}$, where α is the currently selected node in the Markov process and

$$a_{\alpha\beta} = \min\left(1, \frac{P_{i\beta}}{P_{i\alpha}}\right).$$

If β is not accepted, we keep α for the next iteration. The final neighbor is the node selected at the last iteration.
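Because the proposals are drawn from the stub list itself (probability $q_\beta$) and the target is proportional to $q_j P_{ij}$, this is an independence Metropolis sampler, and the $q$ terms cancel in the acceptance ratio. The sketch below illustrates that scheme; the function names are hypothetical, and `iterations=200` is an arbitrary stand-in for "a large number of iterations".

```python
import random
from math import comb
from typing import List, Optional


def binom_pmf(x: int, n: int, p: float) -> float:
    # P(X = x) for X ~ Binomial(n, p); zero outside 0..n
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x) if 0 <= x <= n else 0.0


def triad_compatibility(k_i: int, k_j: int, c: float) -> float:
    # P_ij = sum_x P(T_ij = x) P(T_ji = x)
    return sum(binom_pmf(x, k_i - 1, c) * binom_pmf(x, k_j - 1, c)
               for x in range(min(k_i, k_j)))


def mcmc_neighbor(k_i: int, stub_owners: List[int], degrees: List[int], c: float,
                  iterations: int = 200, rng: Optional[random.Random] = None) -> int:
    """Independence Metropolis sampler over the stub list.

    stub_owners holds one entry per free stub (the node owning it), so a uniform
    draw from it realizes q_j; the acceptance probability is min(1, P_ib / P_ia).
    """
    if rng is None:
        rng = random.Random(0)
    alpha = rng.choice(stub_owners)
    p_alpha = triad_compatibility(k_i, degrees[alpha], c)
    for _ in range(iterations):
        beta = rng.choice(stub_owners)
        p_beta = triad_compatibility(k_i, degrees[beta], c)
        # Move to beta with probability min(1, P_i_beta / P_i_alpha); if the
        # current state has zero weight, always move.
        if p_alpha == 0.0 or rng.random() < min(1.0, p_beta / p_alpha):
            alpha, p_alpha = beta, p_beta
    return alpha


if __name__ == "__main__":
    degrees = [3, 5, 3, 1]
    owners = [v for v, k in enumerate(degrees) for _ in range(k)]
    print(mcmc_neighbor(3, owners, degrees, c=0.3, rng=random.Random(1)))
```

At $C = 1$ only equal-degree partners have nonzero weight, so the chain settles on such a node as soon as it proposes one, consistent with the fully clustered limit in which nodes form equal-degree cliques.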

It is desirable that our algorithm sample networks as uniformly as possible from the ensemble of all networks that realize a given degree distribution and clustering coefficient. It is difficult to prove that our algorithm is truly unbiased in this sense, though our networks do have many of the properties of an unbiased random network. The algorithm produces exactly the right proportion of triads to triples in the limit of large graph size. The degrees of the nodes are chosen as i.i.d. random variables, so in the limit of large graph size the degree distribution is unbiased too. Furthermore, the triads are uniformly distributed throughout the network, as reflected by the fact that the local clustering is independent of degree. Lastly, when this algorithm is used to produce networks with no clustering at all, it produces networks with the same statistical properties as true random graphs with a specified degree distribution. As shown in figure 3, the distribution of component sizes for networks made with this algorithm is identical to that of true random graphs with the specified degree distribution and no clustering.
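For reference, a "stub-matching" baseline of the kind compared against in figure 3 can be sketched as below, together with a component-size tally. The text does not say how its baseline treats self-loops or repeated edges; discarding them here is an assumption, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple


def stub_matching(degrees: List[int], seed: int = 0) -> List[Tuple[int, int]]:
    """Configuration-model baseline: shuffle all stubs and pair them off.
    Self-loops and duplicate edges are discarded (an assumed convention)."""
    rng = random.Random(seed)
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    edges = {(min(a, b), max(a, b)) for a, b in zip(stubs[::2], stubs[1::2]) if a != b}
    return sorted(edges)


def component_sizes(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Sizes of connected components, largest first (union-find)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return sorted(Counter(find(v) for v in range(n)).values(), reverse=True)


if __name__ == "__main__":
    rng = random.Random(3)
    degs = [rng.choice([1, 1, 1, 2, 2, 3, 4]) for _ in range(2000)]
    print(component_sizes(2000, stub_matching(degs))[:5])
```

Histogramming `component_sizes` over many runs of each generator at $C = 0$ gives the kind of comparison reported in figure 3.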

It is worth noting that many real-world networks, particularly in the biological realm, have local clustering which scales as $1/k$ [22]. Our model, in contrast, produces constant local clustering, though it may be possible to generalize our method to create networks with any desired schedule of local clustering.



III. RESULTS 

We have explored the effects of clustering and degree distribution over a wide range of parameters. Figures 4 through 7 illustrate the effect of clustering on the structure of random networks with Poisson degree distributions (z = 3) as clustering is increased from 0 to 1.00. As clustering increases, nodes tend to disaggregate into smaller, tightly connected clusters of nodes with similar degree. This has the overall effect of decreasing the size of the giant component as clustering is increased. In the limit as C goes to 1, we find that the network breaks down into many small, completely connected cliques, with each node in a clique sharing a common degree.

Figure 8 shows the effects of clustering on the size of the giant component for a Poisson random network. Clustering varies from 0.05 to 0.90. The giant component seems to undergo a phase transition at a critical level of clustering around C = 0.60. In the next section we will find that the critical clustering value is actually $C^* \approx 0.618$. At this point, nodes suddenly disaggregate into much smaller, tightly interconnected groups. Similar phase transitions have been observed throughout the networks literature, particularly concerning the targeted deletion of links and nodes in percolation phenomena [23]. Our algorithm produces a similar disconnection without modifying the degree distribution of the network.

FIG. 2: Left: Random network with power-law degree distribution, κ = 15, γ = 2, C = 0.15. Right: Random network with Poisson degree distribution, z = 4, C = 0.40.

FIG. 3: Random graphs were generated with an exponential degree distribution (λ = 1.5) using two algorithms: (1) the clustering algorithm described in this text with C = 0, and (2) a "stub-matching" algorithm as in […], known to produce true random graphs with specified degree distributions. The frequency of component sizes is illustrated above.

FIG. 4: Random network on 1500 nodes, Poisson degree distribution (z = 4), C = 0.00.
FIG. 5: Random network on 1500 nodes, Poisson degree distribution (z = 4), C = 0.30.




FIG. 6: Random network on 1500 nodes, Poisson degree distribution (z = 4), C = 0.60. The image is zoomed in on several of the largest components.



Regarding power-law networks (see figure 9), we note the striking tendency of moderate levels of clustering to inhibit the formation of the giant component. Because the number of potential triads connected to a node scales as $k^2$, the high-degree vertices account for most of the clustering. In networks with highly skewed degree distributions such as power laws, the high-degree nodes must connect to one another in order to realize the required number of triads. This limits their ability to act as hubs for low-degree vertices, and consequently the network disconnects into smaller components. Large components can be preserved under much higher clustering with distributions such as the Poisson.
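A quick calculation illustrates the $k^2$ argument: compare the share of nodes above a degree threshold with the share of potential triads, $\binom{k}{2}$, that those nodes carry. The parameters (γ = 2, κ = 10 for the power law, as in figure 9; z = 4 for the Poisson case), the thresholds, and the truncation `k_max` are illustrative choices, and the function names are hypothetical.

```python
import math
from typing import Callable, Tuple


def tail_shares(weight: Callable[[int], float], k0: int, k_max: int) -> Tuple[float, float]:
    """Return (share of nodes with degree >= k0, share of potential triads
    binom(k, 2) attached to those nodes) for unnormalized weights on 0..k_max."""
    ks = range(k_max + 1)
    w = [weight(k) for k in ks]
    triads = [wk * k * (k - 1) / 2 for k, wk in zip(ks, w)]
    return sum(w[k0:]) / sum(w), sum(triads[k0:]) / sum(triads)


def power_law(k: int, gamma: float = 2.0, kappa: float = 10.0) -> float:
    # k^{-gamma} e^{-k/kappa} for k >= 1 (normalization cancels in the shares)
    return 0.0 if k < 1 else k ** -gamma * math.exp(-k / kappa)


def poisson(k: int, z: float = 4.0) -> float:
    # z^k e^{-z} / k! in log space
    return math.exp(k * math.log(z) - z - math.lgamma(k + 1))


print(tail_shares(power_law, 4, 2000))
print(tail_shares(poisson, 7, 100))
```

For these parameters, roughly 9% of nodes carry about 87% of the potential triads in the power-law case, whereas in the Poisson case about 11% of nodes carry about 37%. Realizing a given C therefore depends far more heavily on the hubs in the power-law network.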
FIG. 7: Random network on 1500 nodes, Poisson degree distribution (z = 4), C = 0.97.
FIG. 8: Size of the giant component versus the clustering coefficient in a Poisson random network, z = 3. Each point represents the average of 40 trials.



The phase transition also undergoes major changes with the introduction of clustering, although this effect seems to depend sensitively on the degree distribution. In figure 10 we see that the phase transition where a giant component forms is not significantly affected by the introduction of clustering for networks with power-law degree distributions. In contrast to the Poisson random networks, there is no sharp phase transition between the regime with a giant component and the regime without one. This bears some resemblance to percolation phenomena, where the phase transition disappears for true power laws with an exponent of 2. But in figure 11 we see that the point of the phase transition is dramatically shifted forward (to larger z) for the Poisson random network. It is somewhat surprising to observe the phase transition being shifted forward, as our algorithm introduces degree assortativity into the network, and previous research has shown the tendency of degree assortativity to shift the point of the phase transition backwards [24].

FIG. 9: N = 5,000 nodes. Power law with parameters κ = 10 and γ = 2. Each point represents the average of 40 trials. Contrast this with figure 8. The phase transition is much less sharp than for the Poisson random networks.

[Plot: giant component size versus κ (5 to 15) for C = 0 and C = 0.15.]

FIG. 10: Two random networks are compared over a range of parameter values for the power-law degree distribution with parameters κ and γ = 2. Each point represents the average of 40 trials.



IV. PHASE TRANSITIONS

By giant component we mean a component which, in the limit of large network size, occupies a proportion of the nodes greater than zero. The phase transition is a manifold in the parameter space of C and the parameters governing the degree distribution where a giant component comes into existence. It is a necessary condition for a giant component to exist that, if we pick a node at random, the average number of neighbors two steps away, $s_2$, exceeds the number of neighbors one step away, $s_1$ [25]. This is intuitive, since if it were not the case, the number of neighbors n steps away would decrease to zero on average, and the component would be finite in the limit of large network size.

We can use this to approximate the point of the phase transition. Formally, we will solve for the point where

$$s_1 = s_2. \qquad (1)$$

The necessary condition (1) will not quite be a sufficient condition in the presence of clustering, as described below. Thus, our solution will only be a lower bound on the point of the phase transition, but in practice it serves as an excellent approximation.

For the Poisson degree distribution, the average number of nodes one step away is equal to the parameter of the distribution z, so we have $s_1 = z$. As is well known […], the number of edges emanating from a node, if we pick an edge at random and follow it to one of its ends, is also z for the Poisson degree distribution. Thus, in the absence of clustering we would have simply $s_2 = s_1 z = z^2$, where $s_2$ is the average number of nodes two steps away from a randomly chosen node.

In the presence of clustering, things become more complicated. Let's pick a node uniformly at random in the network and call this node $v_0$. A neighbor of this node, $v_1$, will have on average z connections besides the one to $v_0$. Furthermore, on average $Cz$ of these will be triadic connections shared by $v_0$ and $v_1$, as each of those connections has a probability C of being a triad. We can simply deduct the triadic connections from $s_2$, so that we have

$$s_2 \le z^2 - Cz^2 = z^2(1 - C). \qquad (2)$$

There is no equality in equation (2) because there is an additional force limiting the number of second neighbors: once two neighbors of $v_0$, say $v_1$ and $v_1'$, share a triadic connection, it becomes more likely that a node two steps away from $v_0$, say $v_2$, is a common neighbor of both $v_1$ and $v_1'$. In fact, such connections exist with probability C. Then, the number of connections we should deduct from the neighbors at distance two due to common connections of nodes at distance one is equal to C times the average number of triadic connections at distance one, or in other words $C^2 z^2$. Thus, we have

$$s_2 = z^2 - Cz^2 - C^2 z^2 = z^2(1 - C - C^2).$$


We can use this to solve for the critical $z_c^*$ where a giant component forms, given a level of clustering C:

$$z = z^2(1 - C - C^2). \qquad (3)$$

The non-zero root of this equation is given by

$$z_c^* = \frac{1}{1 - C - C^2}. \qquad (4)$$
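Equation (4) is easy to evaluate numerically. The sketch below (hypothetical names) returns the lower-bound estimate of the critical mean degree for a given C, reporting infinity once $1 - C - C^2 \le 0$, where equation (3) has no positive root.

```python
import math


def critical_mean_degree(c: float) -> float:
    """Lower-bound estimate z_c* = 1 / (1 - C - C^2) from equation (4)."""
    denom = 1.0 - c - c * c
    if denom <= 0.0:
        return math.inf   # no finite z satisfies equation (3)
    return 1.0 / denom


C_STAR = (math.sqrt(5.0) - 1.0) / 2.0   # positive root of 1 - C - C^2 = 0

if __name__ == "__main__":
    for c in (0.0, 0.15, 0.30, 0.40):
        print(f"C = {c:.2f}: z_c >= {critical_mean_degree(c):.3f}")
    print(f"C* = {C_STAR:.3f}")
```

For the four clustering levels used in figure 11, this gives 1.000, 1.208, 1.639 and 2.273.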



Note that when C = 0, we retrieve the well-known result that a giant component forms when z = 1 in the absence of clustering. Unfortunately, we can only say that this is a lower bound for the phase transition, because the nodes at distance two are not identical to $v_0$: the number of outgoing connections from such nodes (to nodes not already counted) is less than $z - C^2 z$ on average.

In figure 11 we have plotted the size of the giant component versus the parameter z for several levels of clustering. The vertical lines correspond to the phase transitions as given by equation (4). We find good agreement between theory and simulation.

FIG. 11: The size of the giant component is shown vs. z, the parameter of the Poisson degree distribution, for four levels of clustering (C = 0.0, C = 0.15, C = 0.30, C = 0.40). The vertical lines indicate the point of the phase transition for each level of clustering predicted by equation (4).

There is a singularity in equation (4) where $1 - C - C^2 = 0$. At this point, $C^* = (\sqrt{5} - 1)/2 \approx 0.618$, the giant component disappears regardless of the average degree z of the degree distribution. $C^*$ represents the critical level of clustering that can coexist in a network with a giant component.



V. FINITE SIZE EFFECTS 

During the execution of the algorithm, it occasionally happens that a node cannot find a suitable neighbor because no node left in the network has the correct degree and free stubs to satisfy the degree assortativity requirements. This imperfection is due to the finite size of the network; in the limit of large network size, every node would always be able to find just the right profile of neighbors with the right degree. There is no perfect way to deal with such discrepancies. For the simulations used in this article, we have simply truncated the degree of that node so that it does not have to seek a new neighbor. Even with networks of only 5000 nodes, the number of corrections made is quite small.

FIG. 12: The percentage reduction in the number of "stubs" is shown versus the clustering coefficient for two networks: (i) Poisson degree distribution with parameter z = 4, (ii) exponential degree distribution with parameter λ = 2. N = 5000 for both networks. Each point is based on the average of 20 trials.

FIG. 13: The percentage reduction in the number of "stubs" is shown versus the network size. The network has a Poisson degree distribution with parameter z = 4, C = 0.80. Each point is based on the average of 20 trial networks.

Figures 12 and 13 show the effects of clustering and network size on the number of degree corrections made by the algorithm. Figure 12 shows the effect of clustering on the number of corrections made for two networks. Note that the total number of "stubs" in the network is equal to the average degree of the nodes times the population size. The corrections made are shown as the percent reduction in the number of "stubs". Even at 90% clustering, the Poisson random network undergoes less than a 5% reduction in its "stubs".
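The correction measure used in figures 12 and 13 can be written down directly. In this sketch (hypothetical name), the target number of stubs is the sum of the drawn degrees, that is, the average degree times N, and the realized number is the sum of final degrees after any truncation.

```python
from typing import Dict, List, Set


def stub_reduction_percent(target_degrees: List[int], adj: Dict[int, Set[int]]) -> float:
    """Percent reduction in stubs: 100 * (target - realized) / target."""
    target = sum(target_degrees)   # average degree times population size
    realized = sum(len(adj.get(v, ())) for v in range(len(target_degrees)))
    return 100.0 * (target - realized) / target if target else 0.0


# Four nodes drawn with degree 2, but only the path 0-1-2-3 could be realized:
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(stub_reduction_percent([2, 2, 2, 2], path))  # 25.0
```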

Figure 13 shows the effects of network size on the number of corrections made. As expected, the number of corrections drops as the number of nodes in the network increases. For 7000 nodes and 80% clustering, a Poisson random network undergoes less than a 0.5% reduction in its "stubs".
VI. DISCUSSION

We have presented a method for generating random networks which unites two frequently modeled topological features: clustering and the degree distribution. Our model allows networks to be generated over the full spectrum of combinations of these parameters.

Random network models can serve several important purposes. First, they can serve as a null hypothesis about the structure of a real-world network. Significant deviations in the structure of the real-world network from a corresponding random graph indicate that there are more forces at work shaping the network than are accounted for in the random graph model. These deviations can then motivate further inquiry into the forces shaping real-world networks […].

Second, real-world networks are very often of such a scale that it is impossible to map them entirely. Various network sampling techniques have been devised to estimate features of the network topology in the absence of data on the entire network […, 28]. Given reliable estimates of network topology, a random network can then be generated which reproduces this topology. The random network may be used as a stand-in when modeling various dynamic processes on networks.

Lastly, the family of random networks we have presented here enables the exploration of a huge parameter space for models on networks. There are a growing number of models which describe dynamic processes explicitly on networks. Examples are models of diffusion processes, such as models of epidemics […, 31], models of fads […, 33], the spread of rumors [34, 35], and the migration of species among connected habitats […]. Other models explore reciprocal interactions among nodes embedded in a network; examples include spin glasses, Kuramoto oscillators, and disordered neural networks. There are numerous potential applications for exploring the effects of clustering and degree distributions on these and other models.